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ABSTRACT 

To facilitate the analysis of large-scale high- 
throughput capillary electrophoresis data, we previ- 
ously proposed a suite of efficient analysis software 
named HiTRACE (High Throughput Robust Analysis 
of Capillary Electrophoresis). HiTRACE has been 
used extensively for quantitating data from RNA 
and DNA structure mapping experiments, including 
mutate-and-map contact inference, chromatin foot- 
printing, the Eterna RNA design project and other 
high-throughput applications. However, HiTRACE 
is based on a suite of command-line MATLAB 
scripts that requires nontrivial efforts to learn, use 
and extend. Here, we present HiTRACE-Web, an 
online version of HiTRACE that includes standard 
features previously available in the command-line 
version and additional features such as automated 
band annotation and flexible adjustment of annota- 
tions, all via a user-friendly environment. By making 
use of parallelization, the on-line workflow is also 
faster than software implementations available to 
most users on their local computers. Free access: 
http://hitrace.org. 

INTRODUCTION 

Capillary electrophoresis (CE) is one of the most powerful 
and widely used nucleic acid separation techniques avail- 
able (1). Its conventional application areas include 
genomic mapping, forensic identification and genome 
sequencing (2,3). Recently combined with chemical 
probing methodologies, CE further provides a powerful 
and rapid means to map complex DNA and RNA struc- 
tures at single-nucleotide resolution (4). 

In chemical-probing-based RNA structure inference 
studies, a chemical reagent modifies the RNA of interest, 



either cleaving it or forming a covalent adduct with it. 
Commonly used reagents include hydroxyl radicals (5,6), 
dimethyl sulfate alkylation (DMS) (7), carbodiimide 
modification (CMCT) (8) and the SHAPE strategy using 
2'-OH acylation (9). Subsequent reverse transcription 
detects the modification sites as stops to primer extension 
at nucleotide resolution. Traditionally, the resulting 
cDNA fragments were resolved in sequencing gels 
followed by individually quantifying band intensities. To 
resolve these fragments in a high-throughput fashion, CE 
can be used. CE-based chemical probing can produce tens 
of thousands of individual electrophoretic bands from a 
single experiment, leading to recent breakthroughs in 
high-throughput nucleic acid structure mapping: e.g. auto- 
mated modeling of complex RNA structures (4,10-13) 
such as ribosomes (14) and viruses (15,16). 

Analyzing a large number of electrophoretic traces from 
a high-throughput structure-mapping experiment is time- 
consuming and poses a significant informatics challenge. 
It requires a set of robust signal-processing algorithms for 
accurate quantification of the structural information 
embedded in the noisy traces. Current software methods 
for CE analysis include capillary automated footprinting 
analysis (CAFA) (10), ShapeFinder (11), high-throughput 
robust analysis for CE (HiTRACE) (17), fast analysis of 
SHAPE traces (FAST) (18) and QuShape (19). The most 
recent of these methods have largely converged on the 
basic steps of analysis: preprocessing (such as selection 
of the data-containing range and baseline adjustment), de- 
convolution of co-loaded mapping signal traces and refer- 
ence traces, alignment, peak detection, band annotation, 
peak fitting, signal decay and background subtraction. 

Although these programs are all useful for semi-auto- 
mated CE data analysis, they suffer from certain limita- 
tions. CAFA has effective peak-fitting capabilities but 
lacks alignment and annotation features, thus 
necessitating laborious efforts for analyzing multiple 
capillaries. ShapeFinder and QuShape provide 
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Figure 1. Flowchart of HiTRACE-Web analysis pipeline. Each of the steps marked with an asterisk corresponds to a tab in the web server 
implementation. The first stage is preprocessing in which the user can confirm or refine the signal and the reference channels, regions of interest 
(ROI), and the correction of constant baselines. HiTRACE-Web carries out profile alignment in the second stage. Both linear (for within-batch and 
between-batch alignment) and piece-wise-linear alignment is performed. After alignment, the user can check the intermediate result, provide sequence 
and structure information and specify types of chemical reagents in the third stage. In the next stage, HiTRACE-Web provides an automated band 
annotation functionality, which allows users to complete initial assignments of hundreds of bands in hundreds of profiles in the order of seconds. 
HiTRACE-Web then performs peak fitting to approximate a profile as a sum of Gaussian curves. 



sophisticated signal alignment, annotation and peak 
fitting capabilities but have limited cross-capillary 
analysis capabilities, making these tools less efficient for 
analyzing data from large-scale experiments involving 
hundreds of capillaries. FAST is highly optimized for 
rapid band annotation but relies on the proprietary 
Applied Biosystems (ABI) utility programs, which has 
made it difficult to customize and extend to various ex- 
perimental scenarios. HiTRACE is feature rich and has 
been extensively used for high-throughput structure 
mapping studies such as the mutate-and-map strategy 
(13,20,21), chromatin footprinting and the massively 
parallel RNA design project Eterna (22). However, 
HiTRACE is a suite of command-line MATLAB scripts 
that requires non-trivial efforts to learn, use and extend, 
and a purchased license (The MathWorks, http://www. 
mathworks . com ) . 

HiTRACE-Web is based on the HiTRACE workflow 
and thus inherits all of its core functionalities for high- 
throughput CE analysis. Going one step further, 
HiTRACE-Web provides an integrated and interactive 
on-line interface. In addition to the standard features pre- 
viously available, HiTRACE-Web presents additional 
features such as automated band annotation and adjust- 
ment. By making use of multi-core server processors, 
HiTRACE-Web also runs faster than the previous off- 
line implementations available to most users on their 
local computers. To the best of the authors' knowledge, 
HiTRACE-Web is the first on-line tool for high-through- 
put CE data analysis. 

METHOD OVERVIEW 

HiTRACE-Web takes as input a set of nucleic acid struc- 
ture mapping profiles obtained from a number of 



capillaries. A profile represents the intensity of a nucleic 
acid sample in a capillary as a function of electrophoretic 
time. The peaks in a profile appear as bands in a gray-scale 
image. Band/peak locations represent individual residues 
of a nucleic acid sequence. Commonly used ABI sequen- 
cers (Life Technologies Corporation, Carlsbad, USA) can 
separate the products of hundreds of nucleotides, and we 
assume that each CE profile contains hundreds of bands. 
The final output of HiTRACE-Web is a set of aligned and 
annotated profiles with quantified band areas in text and 
image formats. To deliver the output, HiTRACE-Web 
works in multiple stages interleaved with user checkpoints: 
preprocessing, profile alignment, band annotation, peak 
fitting and quantification. Figure 1 shows the flowchart 
of the analysis pipeline. 

In preprocessing, the user needs to provide information 
on capillary channel compositions. To facilitate the 
analysis process, it is common in typical CE experiments 
to co-load samples with a reference ladder, which fluor- 
esces in a different color. Multiple spectral channels thus 
exist in each capillary, and the user specifies which 
channels contain the structure mapping signal and the ref- 
erence (typically a ladder) (Step A.l). 

The next step in preprocessing is to select the range of 
analysis, as typical profiles carry information only within a 
subset of the entire electrophoresis run. HiTRACE-Web 
uses edge-detection techniques (23) to select proper 
regions for further analysis automatically (Step A.2). It 
is also possible for the user to set the region manually. 
HiTRACE-Web then carries out a series of additional 
preprocessing operations on the selected region of 
capillaries such as adjustment of constant baseline (Step 
A.3). More details of the preprocessing operations can be 
found in (17). 
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The second stage in HiTRACE-Web is alignment. 
Profiles need to be aligned to each other because the 
products in different capillaries are subject to slightly dif- 
ferent electrophoretic conditions, and their detection times 
are then shifted and scaled compared with each other. 
Owing to the finite number of capillaries, an ABI sequen- 
cer can handle at a time (e.g. 96 for ABI3730), CE experi- 
ments using hundreds of capillaries consist of multiple 
batches. We designate the first capillary in each batch as 
the reference and use it for within-batch alignment. There, 
we align each profile to the reference by finding the 
optimal shift and scaling amounts that maximize its cor- 
relation to the reference (part of Step B.l). After the 
within-batch alignment, we carry out additional alignment 
procedures: between-batch adjustment for global align- 
ment (part of Step B.l), adjustment of smooth baseline 
(Step B.2) and dynamic-programming-based piece- wise- 
linear alignment for fine-tuning local disagreements (17) 
(Step B.3). 

After alignment, the user checks intermediate results 
(Step C.l) and specifies the sequence probed and the 
type of chemical modifications applied to each capillary 
(Step C.2). The user can currently select among nine 
options: three types of chemical reagents (DMS, CMCT 
and SHAPE), four types of dideoxynucleotides (ddGTP, 
ddATP, ddTTP and ddCTP), no modification or 'other'. 
HiTRACE-Web provides a convenient graphical interface 
for selecting signal and reference channels and specifying 
capillary modification types. This specification provides 
HiTRACE-Web with information on where bands 
should appear in the data, e.g. primarily at A and C pos- 
itions for DMS modification of RNA data (24), and with 
annotations carried forward to the final result files (see 
later in the text). 

Following the profile alignment is the band annotation 
procedure (Step D.l), which refers to the process of 
mapping each band in an electrophoretic profile to a 
position in the nucleic acid sequence. For verification, 
visual inspection of band annotation results is normally 
inevitable to certain extent, but the manual band annota- 
tion step takes significant human efforts when there are a 
large number of profiles. To address this issue, 
HiTRACE-Web provides an automated band annotation 
functionality, which allows users to complete initial as- 
signments of hundreds of bands in hundreds of profiles 
in the order of seconds. More details of the automated 
band assignment algorithm are beyond the scope of this 
server-focused paper and will be described elsewhere. 
HiTRACE-Web also provides an interactive interface to 
adjust the band annotation manually so that users can 
correct any suboptimal assignments. 

In the last stage, HiTRACE-Web performs peak fitting 
(Step D.2) to approximate a profile as a sum of Gaussian 
curves, each of which is centered at the intensity peak 
location. The peak amplitudes are selected in the least- 
squares sense by using a standard optimization technique, 
thus minimizing the deviations of the Gaussian model 
from the CE profiles (17). As the final output, 
HiTRACE-Web reports the peak quantification results 
including the area and location of each band and 
exports data annotations, sequence and structure to an 



RNA data (RDAT) formatted file (25). The RDAT file 
can then be submitted to the RNA Mapping Database 
repository for data sharing (http://rmdb.stanford.edu/ 
submit) or structure server for secondary structure model 
estimation (http://rrmdb.stanford.edu/structureserver) (25) 
and links to these tools are provided. 

HiTRACE-Web leaves correction for attenuation of 
band intensity for longer products ('signal decay'), back- 
ground subtraction and final normalization to the user. 
These post-processing tasks are carried out differently by 
independent groups who have made distinct ad hoc as- 
sumptions (19,26,27), and these tasks are left out of 
some applications, such as the mutate-and-map technique 
(21), owing to the introduction of noise. Robust experi- 
mental and computational approaches for post-processing 
chemical mapping data are in development (T. Mann, PC, 
RD, in preparation), as are additional features including 
secondary structure display and error estimation in auto- 
matic sequence assignment. Inclusion into HiTRACE- 
Web will allow these advances to be disseminated 
rapidly to the RNA structure mapping community. 



WEB SERVER 

The HiTRACE-Web server uses the Apache HTTP Server 
(The Apache Software Foundation, http://httpd. apache, 
org/) for basic web services and the MySQL Server 
(Oracle Corporation, http://www.mysql.com/) for 
internal data management. For client-side programming, 
we used Codelgniter (EllisLab, http://ellislab.com/ 
codeigniter), an open-source web application framework 
and coded the web pages in PHP (The PhP Group, http:// 
php.net/) and JavaScript (Mozilla Foundation, https://de 
veloper.mozilla.org/en/docs/JavaScript) with jQuery (The 
jQuery Foundation, http://jquery.com/) and jQuery user- 
interface plugins. Interactive user-interface components 
also use the canvas elements defined in the HTML5 
standard (World Wide Web Consortium, http://www.w3. 
org/TR/html5/). We used Ajax [http://en.wikipedia.org/ 
wiki/Ajax_(programming)] for asynchronous data trans- 
fers between the client and server sides. On the server 
side, we used Gearman (http://gearman.org), an open- 
source application framework, to schedule multiple 
requests. Each user request is handled by a worker 
written in PHP. The worker creates template code in 
MATLAB that contains input parameters and links to 
the user data. The worker also forks the MATLAB inter- 
preter so that it can execute the template code. Most of the 
time-consuming operations are multi-threaded, and the 
current HiTRACE-Web server runs on a 48-core 
machine (four on-board AMD Opteron 6172 processors 
with 256 GB main memory; Ubuntu Linux version 3.2.0- 
29). In this environment, processing 52 capillaries (with 60 
bands per capillary) takes a few minutes including manual 
adjustments. HiTRACE-Web supports most of the widely 
used web browsers including Google Chrome (Version 
25.0 or later; http://www.google.com/chrome/), Mozilla 
Firefox (Version 19.0 or later; http:// www.mozilla.org/), 
Apple Safari (Version 6.0 or later; http:// www.apple.com/ 
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Figure 2. Overview of data analysis using HiTRACE-Web. (A) Users can select the reference and signal channels using the graphical interface. 
(B) HiTRACE-Web provides a feature to select the valid region of interest automatically. Alternatively, users can set the range manually. The red 
rectangles in the figure represent the regions of interest. (C) Before advancing to the next step, HiTRACE-Web provides a snapshot of intermediate 
results so that the user can go over the previous steps if needed. (D) HiTRACE-Web provides an intuitive graphical interface to specify the chemical 
modifications made to capillaries. For each of them, the user can select among nine color-coded options: three chemical reagents [DMS (7), CMCT 
(8), the SHAPE strategy using 2'-OH acylation (9)], reference ladders using four dideoxynucleotides (ddGTP, ddATP, ddTTP and ddCTP), no 
modification and 'other'. (E) HiTRACE-Web can carry out band annotation in an automated fashion, thus reducing the analysis time substantially. 

(continued) 
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Figure 3. Confirming consistency in band quantification results between tools and analyses. (A and B) Correlation of quantification results between 
HiTRACE-Web and HiTRACE (17) for two independent users. These plots indicate lack of any major systematic deviations induced by HiTRACE- 
Web. (C and D) HiTRACE-Web gives the same level of consistency between independent analyses as HiTRACE. The data used are from round 44 
of the massively parallel RNA design project Eterna (22). A 92 nt RNA sequence was treated in six different mapping conditions including SHAPE, 
DMS and ddTTP, giving 552 bands in total. 



safari/) and Microsoft Internet Explorer (Version 9.0 or 
later; http://windows.microsoft.com). 

Figure 2 explains the overall flow of CE data analysis 
using HiTRACE-Web. The input file is a zip-compressed 
collection of .abl or .fsa files from ABI sequencers in the 
ABIF file format. If there are multiple batches, each batch 
should be presented in a subfolder, as is the typical output 
from ABI sequencers. After uploading the input file, the 
user needs to specify the signal and reference channels. 
HiTRACE-Web provides a graphical interface for 
channel selection (Figure 2A). Next, the region of 



interest is specified either manually or automatically 
(Figure 2B). HiTRACE-Web then carries out the align- 
ment of the profiles in the selected region, following the 
procedures outlined previously. The user can visually 
inspect the intermediate result after alignment 
(Figure 2C) and go over the previous stages if needed. If 
satisfactory, the user can start specifying the type of 
chemical modification data present in each capillary 
using the graphical interface HiTRACE-Web provides 
(Figure 2D). The user can then perform band annotation 
(Figure 2E). HiTRACE-Web permits either manual or 



Figure 2. Continued 

Furthermore, HiTRACE-Web provides a user-friendly interface for users to fine-tune band annotation results manually. Red circles represent the 
residue locations. Each type of nucleotide is associated with a different color (G: green, C: cyan, U: blue and A: red). By clicking the image, user can 
select or deselect the position of individual residues. The auto-assigned bands are shown on the right side in gray for easier referencing when 
manually adjusting the assignment. (F) HiTRACE-Web reports a set of aligned and annotated profiles with quantified peak areas in the image, tab- 
delimited text and RDAT (25) formats users can download for further uses. 
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automated band annotation; in practice, both types of 
annotations complement each other: performing auto- 
mated band annotation first and then adjusting the 
result manually allows a large number of bands to be 
annotated quickly. After band annotation, HiTRACE- 
Web carries out peak fitting and quantification. The 
results are provided as downloadable images (Figure 
2F), tab-delimited text files (with quantified areas; one 
column per capillary and one row per residue) and an 
RDAT file that can be submitted to the repository and 
structure modeling server available at the RNA 
Mapping Database (25). More detailed instructions to 
each analysis stage and explanations of results are avail- 
able at the HiTRACE-Web homepage. A complete 
tutorial with example data is also available, with specific 
help at each step. 

We tested the hypothesis that the automation of the 
analysis pipeline permits high reproducibility in CE 
analysis. Figures 3A and 3B show the correlation of quan- 
tification results between HiTRACE-Web and HiTRACE 
(17) for two different users. The high level of Pearson's cor- 
relation coefficients (R 2 of 0.9996 and 0.9999; P< 0.001) 
between band intensities quantified with HiTRACE-Web, 
and those quantified with HiTRACE indicates lack of any 
major systematic deviations induced by HiTRACE-Web. 
To test the consistency in band quantification between dif- 
ferent users, we let two independent users carry out quanti- 
fication of the same data using HiTRACE-Web and 
HiTRACE, as shown in Figure 3C and D, respectively. 
Both tools resulted in excellent agreement between the in- 
dependent analyses (R 2 of 0.9993 and 0.9986; P< 0.001). 
The data used are from round 44 of the massively parallel 
RNA design project Eterna (22). The mapped sequence 
length was 92 nt, and the data describe six different 
mapping experiments including SHAPE, DMS and refer- 
ence ladders. The total number of bands was 552. 



SUMMARY 

We developed HiTRACE-Web (freely available at http:// 
hitrace.org), an online server for rapid analysis of large 
collections of profiles obtained from high-throughput CE 
experiments. Recent generations of high-throughput DNA 
and RNA structure mapping studies give hundreds of CE 
profiles each containing hundreds of bands. To isolate 
signals from such a large collection of profiles, we previ- 
ously developed a signal-processing pipeline that consists 
of multiple stages including preprocessing, alignment, an- 
notation and band quantification. HiTRACE-Web now 
enables users to follow the whole pipeline through a user- 
friendly, integrative and interactive web environment. It is 
our hope that HiTRACE-Web can contribute to large- 
scale structure inference studies based on chemical 
probing and CE separation by providing an effective and 
easily accessible data analysis framework. 
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